Weighted and Unweighted Transducers for Tweet Normalization
Authors
Abstract
We present two simple finite-state transducer based strategies for tweet normalization. One relies on hand-written correction rules designed to capture commonly occurring misspellings and abbreviations, while the other tries to automatically induce an error model from a gold standard corpus of normalized tweets.
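The two strategies can be illustrated with a minimal sketch. This is not the authors' implementation. Ordered regular-expression substitutions stand in for a compiled rule transducer, and a table of negative log relative frequencies stands in for an induced weighted error model. The rules, the function names, and the toy training pairs are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-written correction rules, applied in order.
# In a real system these would be compiled into a finite-state transducer.
RULES = [
    (re.compile(r"\bu\b"), "you"),          # common abbreviation
    (re.compile(r"\b2moro\b"), "tomorrow"),  # common abbreviation
    (re.compile(r"(\w)\1{2,}"), r"\1\1"),    # collapse 3+ repeated characters to two
]


def normalize_with_rules(tweet):
    """Unweighted strategy: apply each rewrite rule once, in order."""
    for pattern, replacement in RULES:
        tweet = pattern.sub(replacement, tweet)
    return tweet


def induce_error_model(pairs):
    """Weighted strategy: estimate a cost for each (noisy, gold) token pair.

    The cost is the negative log relative frequency in the aligned
    training pairs, so lower cost means a more likely correction.
    """
    counts = defaultdict(Counter)
    for noisy, gold in pairs:
        counts[noisy][gold] += 1
    model = {}
    for noisy, gold_counts in counts.items():
        total = sum(gold_counts.values())
        model[noisy] = {g: -math.log(n / total) for g, n in gold_counts.items()}
    return model


def normalize_with_model(tweet, model):
    """Replace each known token by its lowest-cost correction."""
    out = []
    for tok in tweet.split():
        candidates = model.get(tok)
        out.append(min(candidates, key=candidates.get) if candidates else tok)
    return " ".join(out)
```

The rule list is easy to inspect and extend by hand, while the induced model only covers tokens seen in the training pairs. A transducer-based error model could generalize to unseen tokens by learning character-level edits instead.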
Similar resources
Generic ε-Removal and Input ε-Normalization Algorithms for Weighted Transducers
We present a new generic ε-removal algorithm for weighted automata and transducers defined over a semiring. The algorithm can be used with any semiring covered by our framework and works with any queue discipline adopted. It can be used in particular in the case of unweighted automata and transducers and weighted automata and transducers defined over the tropical semiring. It is based on a gene...
Word Normalization in Twitter Using Finite-state Transducers
This paper presents a linguistic approach based on weighted finite-state transducers for the lexical normalisation of Spanish Twitter messages. The system developed consists of transducers that are applied to out-of-vocabulary tokens. Transducers implement linguistic models of variation that generate sets of candidates according to a lexicon. A statistical language model is used to obtain the m...
The TALP-UPC Approach to Tweet-Norm 2013
This paper describes the methodology used by the TALP-UPC team for the SEPLN 2013 shared task of tweet normalization (Tweet-Norm). The system uses a set of modules that propose different corrections for each out-of-vocabulary word. The final correction is chosen by weighted voting according to each module's accuracy.
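Accuracy-weighted voting of this kind can be sketched as follows. This is a minimal illustration and not the TALP-UPC system. The module names, the accuracy values, and the function name are hypothetical, and each module is assumed to propose exactly one candidate.

```python
from collections import defaultdict


def weighted_vote(proposals, accuracy):
    """Pick the candidate whose supporting modules have the highest summed accuracy.

    proposals: maps a module name to the correction it proposes.
    accuracy:  maps a module name to its accuracy on held-out data.
    """
    scores = defaultdict(float)
    for module, candidate in proposals.items():
        scores[candidate] += accuracy[module]
    # On a tie, max() returns the candidate that was scored first.
    return max(scores, key=scores.get)


# Hypothetical modules and accuracies for one out-of-vocabulary word.
proposals = {"edit_distance": "que", "phonetic": "que", "lexicon": "queso"}
accuracy = {"edit_distance": 0.4, "phonetic": 0.3, "lexicon": 0.6}
best = weighted_vote(proposals, accuracy)
```

Here two weaker modules that agree outvote a single stronger module, because their summed accuracy (0.7) exceeds 0.6.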
Internship Report Compositions of Extended Top-down Tree Transducers
Many aspects of machine translation of natural languages can be formalized by employing weighted finite-state (string) transducers [22, 40]. Successful implementations based on this word- or phrase-based approach are, for example, the AT&T FSM toolkit [41], Xerox’s finite-state calculus [24], the RWTH toolkit [23], Carmel [19], and OpenFst [2]. However, the phrase-based approach is not expressive ...
Data-Driven Spelling Correction using Weighted Finite-State Methods
This paper presents two systems for spelling correction formulated as a sequence labeling task. One of the systems is an unstructured classifier and the other one is structured. Both systems are implemented using weighted finite-state methods. The structured system delivers state-of-the-art results on the task of tweet normalization when compared with the recent AliSeTra system introduced by Ege...